Retrieval using Passage Retrieval, Connectivity

Authors

  • Franco Crivellari
  • Massimo Melucci
Abstract

This report describes the participation in the Web track of the TREC-(Italy). TREC-9 was our first participation in TREC and, therefore, in the Web track. In the following, we describe the experimental approach we chose, the research hypotheses and questions, the problems we encountered, the results we obtained, and our conclusions. We consider this experience the first step towards participation in future Web tracks.

The approach we took to address the problems and the research questions concerns both the scientific side and the implementation side. On the scientific side, we employed an experimental approach that combines classical advanced information retrieval (IR) techniques with connectivity-based algorithms for IR on the Web. Figure 1 depicts the whole process described below. Specifically, we chose those classical IR techniques, i.e. passage retrieval and blind relevance feedback, that have proven effective in producing good retrieval results [1]. Moreover, we are interested in testing whether the connectivity-based algorithms, which have been proposed in different Web contexts, are effective tools for improving classical techniques. On the implementation side, we developed in-house software and employed other publicly available software modules.

[Figure 1: The experimental process. Bold text refers to the submitted runs.]

Baseline. The first 10 passages (title and paragraphs) are extracted from each document and indexed using a stop-list augmented with Web stopwords and the Porter stemming algorithm, while also keeping non-stemmed words; for example, the word "White" is stored together with "white". Title-only and title-description queries are automatically generated and indexed in the same way as passages. For each query, the top 10,000 passages are retrieved and ranked by F-4 [2]. The lists of retrieved passages are reweighted through blind relevance feedback, treating the top 100 passages as relevant. The new lists of 10,000 retrieved passages are mapped to retrieved documents. The document score is the sum of the scores of the mapped passages.

Connectivity-based algorithm. A modified version of the HITS (Hyperlink-Induced Topic Search) algorithm is applied to the provided link files, where the link weight is the baseline score;

Similarity-based algorithm. In-and …
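As an illustration of the baseline and connectivity steps described in the abstract, the following minimal Python sketch sums passage scores into document scores and then runs a weighted variant of HITS over a list of links. It is not the authors' implementation. The function names (`document_scores`, `weighted_hits`), the identifiers in the usage example, and in particular the choice to weight each link by the baseline score of its target document are assumptions made for illustration; the abstract only states that the link weight is the baseline score.

```python
from collections import defaultdict
from math import sqrt


def document_scores(passage_scores, passage_to_doc):
    """Sum the scores of retrieved passages into one score per document."""
    totals = defaultdict(float)
    for passage_id, score in passage_scores.items():
        totals[passage_to_doc[passage_id]] += score
    return dict(totals)


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return scores
    return {node: v / norm for node, v in scores.items()}


def weighted_hits(links, baseline, iterations=20):
    """Iterate hub/authority updates over (source, target) links.

    ASSUMPTION: each link is weighted by the baseline score of its target
    document; documents without a baseline score get weight 0.
    """
    nodes = {node for link in links for node in link}
    hubs = {node: 1.0 for node in nodes}
    auths = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of in-linking pages.
        new_auths = {node: 0.0 for node in nodes}
        for src, dst in links:
            new_auths[dst] += baseline.get(dst, 0.0) * hubs[src]
        auths = _normalize(new_auths)
        # Hub update: weighted sum of the authority scores of linked pages.
        new_hubs = {node: 0.0 for node in nodes}
        for src, dst in links:
            new_hubs[src] += baseline.get(dst, 0.0) * auths[dst]
        hubs = _normalize(new_hubs)
    return hubs, auths


# Hypothetical usage with made-up passage and document identifiers.
passage_scores = {"d1#0": 1.5, "d1#1": 0.5, "d2#0": 1.0}
passage_to_doc = {"d1#0": "d1", "d1#1": "d1", "d2#0": "d2"}
baseline = document_scores(passage_scores, passage_to_doc)
links = [("d3", "d1"), ("d3", "d2"), ("d2", "d1")]
hubs, auths = weighted_hits(links, baseline)
```

In this toy graph, `d1` has the higher baseline score and more in-links, so it ends up with the largest authority value, while `d3`, which links to both scored documents, ends up as the strongest hub. Weighting by the source document's score instead would be an equally plausible reading of the abstract.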

Similar resources

Boosting Passage Retrieval through Reuse in Question Answering

Question Answering (QA) is an important emerging field in Information Retrieval. In a QA system, the archive of questions previously asked of the system forms a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to improve performance in answering future related factoid questions. I...

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches lead researchers to combine them and to use semi-automatic retrieval with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...

Element Retrieval Using a Passage Retrieval Approach

Element and passage retrieval systems are able to extract and rank parts of documents and return them to the user rather than the whole document. Element retrieval is used to search XML documents and identify relevant XML elements, while passage retrieval is used to identify relevant passages. This paper reports a series of experiments on element retrieval, using a general passage retrieval alg...

IIT TREC 2007 Genomics Track: Using Concept-Based Semantics in Context for Genomics Literature Passage Retrieval

For the TREC-2007 Genomics Track [1], we explore unsupervised techniques for extracting semantic information about biomedical concepts with a retrieval model for using these semantics in context to improve passage retrieval precision. Dependency grammar analysis is evaluated for boosting the rank of passages where complementary subject/object concept pairs can be identified between queries and ...

Publication date: 2000